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A new numerical algorithm for solving the symmetric eigenvalue problem is presented. The 
technique deviates fundamentally from the traditional Krylov subspace iteration based techniques 
(Arnoldi and Lanczos algorithms) or other Davidson- Jacobi techniques, and takes its inspiration 
from the contour integration and density matrix representation in quantum mechanics. It will be 
shown that this new algorithm - named FEAST - exhibits high efficiency, robustness, accuracy 
and scalability on parallel architectures. Examples from electronic structure calculations of Carbon 
nanotubes (CNT) are presented, and numerical performances and capabilities are discussed. 
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I. INTRODUCTION 

The generalized eigenvalue problem, that is the de- 
termination of nontrivial solutions (A, x) of Ax = ABx 
with A and B square matrices, is a central topic in nu- 
merical linear algebra and arises from a wide range of 
applications in sciences and engineering. In electronic 
structure calculations, in particular, the eigenvalue prob- 
lem is one of the most challenging applied numerical 
process - also called diagonalization procedure or spec- 
tral decomposition. In these calculations, the electron 
density can be formally calculated by summation of the 
amplitude square of the wave functions \t m solution of 
the Schrodinger-like eigenvalue problem Hf m = E m S>l/ m 
with different discrete energies E m (where H represents 
the Hamiltonian Hermitian matrix and S is a symmet- 
ric positive matrix obtained using non-orthogonal basis 
functions). This procedure can be quite computationally 
challenging for large-scale simulations of systems contain- 
ing more than a hundred of atoms and/or where a large 
number of eigenpairs (E m ,ip m ) are needed. Progress in 
electronic structure calculations as for other large-scale 
modern applications, are then much likely dependent on 
advances in diagonalization methods. 

In the past decades, the eigenvalue problem has led 
to many challenging numerical questions and a central 
problem [l[ : how can we compute eigenvalues and eigen- 
vectors in an efficient manner and how accurate are they? 
Powerful tools have then been developed from Jacobi 
method and power iterations, to iterative Krylov sub- 
space techniques (including Arnoldi, and Lanczos meth- 
ods), or other Davidson- Jacobi techniques Q. Tradi- 
tional numerical algorithms and library packages are yet 
facing new challenges for addressing the current large- 
scale simulations needs for ever higher level of efficiency, 
accuracy and scalability in modern parallel architectures. 

This article presents a new robust and scalable algo- 
rithm design for solving the eigenvalue problem- named 
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FEAST- which deviates fundamentally from the tradi- 
tional techniques above, and takes its inspiration from 
the density matrix representation and contour integra- 
tion in quantum mechanics. Section II summarizes the 
electronic structure and contour integration problems 
which have motivated the development of the new al- 
gorithm. The FEAST algorithm is then described in de- 
tail in Section III, and numerical examples and perfor- 
mance results arc presented in Section IV. Finally, Sec- 
tion V presents some discussions regarding the efficiency, 
robustness and scalability of the algorithm. 



II. THE CONTOUR INTEGRATION 
TECHNIQUE IN ELECTRONIC STRUCTURE 
CALCULATIONS 

Although new fast sparse solvers have allowed consid- 
erable time saving for obtaining the eigenpairs (E m , $ m ) 
in electronic structure calculations, such as the Rayleigh- 
quotient multigrid Q developed for the MIKA package, 
or the parallel Chebyshev subspace iteration technique 
developed for the PARSEC package Hi], these calcula- 
tions are still considered computationally extremely chal- 
lenging and linear scalability is not easily achievable. 

An alternative approach to the Schrodingcr picture 
for obtaining the electron density consists in perform- 
ing a contour integration of the diagonal elements of 
the Green's function matrix G(Z) = (ZS — H) _1 over the 
complex energy space @. At zero temperature, the re- 
sulting expression for the electron density in real-space 
is: 

n(r) = --^ / dZG(r,r,Z)=]T|vMr)| 2 , (1) 

Z7TZ In — 
JC m 

where the complex contour C includes all the eigenval- 
ues E m below the Fermi level Ep, and where the spin 
factor is not considered. It should be noted that at non 
zero temperatures, this expression would also include the 
contribution of the residues of all poles of the Fermi- 
Dirac distribution function on the imaginary axis at the 
position of the Fermi level Q ■ For transport problems 
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and open systems, in turn, the contour integration is of- 
ten used to compute the equilibrium part of the electron 
density [1[ where self-energy boundary conditions need to 
be included in the Hamiltonian matrix H. The contour 
integration technique represents a priori an attractive al- 
ternative approach to the traditional eigenvalue problem 
for computing the electron density since the number of 
Green's function to be calculated -typically ~ O(10) us- 
ing a Gauss quadrature procedure- is independent of the 
size of the system. In particular, an efficient linear scal- 
ing strategy CMB (CMB stands for Contour integration 
- Mode approach - Banded system solver), has been pro- 
posed in (jj [l(| for simulating nanowire-type structures 
within a real-space mesh framework while overcoming the 
impossible numerical task of inverting a large size matrix 
at each point of the contour. For arbitrary systems (i.e. 
beyond nanowire structures), however, there are no nu- 
merical advantages of abandoning the traditional eigen- 
value problem in favor of the contour integration tech- 
nique for computing the electron density. In addition, it 
is clear from equation ([T]) that the contour integration 
technique does not provide a natural route for obtaining 
the individual eigenvectors but rather the summation of 
their amplitudes square. In the following section, a new 
numerical algorithm design FEAST is proposed for ob- 
taining directly the eigenpairs solutions using the density 
matrix representation and a numerically efficient contour 
integration technique. 

III. FEAST 

A. Introduction 

In this section, a new algorithm is presented for solving 
generalized eigenvalue problems of this form 

Ax = ABx, (2) 

within a given interval [A m i n , A max ], where A is real sym- 
metric or Hcrmitian and B is a symmetric positive def- 
inite (s.p.d). One common way to accelerate the con- 
vergence rate of traditional iterative techniques consists 
in performing a factorization of the Green's function 
G(cr) = (crB — A) -1 for some reasonable shift a close 
to the eigenvalues in the search interval and which leads 
to solving linear systems (i.e. shifting strategy). More 
recently, Sakurai et al. [ill, 03 have proposed a root 
finding technique which consists of a contour integration 
of a projected Laurent series-type decomposition of the 
Green's function. In principle, a set of complex moments 
can be obtained by solving few linear systems along the 
contour, which can generate an identical subspace to the 
one spanned by the eigenvectors present inside the con- 
tour. In practice, however, robustness and accuracy are 
not easily achievable. In our approach, we avoid decom- 
posing directly the Green's function and perform instead 
an exact mathematical factorization of its contour inte- 
gration - which represents the reduced density matrix p 



in quantum mechanics. One can show that this factoriza- 
tion can be expressed in terms of the eigenvectors present 
inside the contour as follows: 



p = -— / dZ G(Z) == V |x m )(x m |. (3) 

Z7TZ fn 

In matrix notations the second term of the equation reads 
XX T where X NxM = {xi,X2, ..xm} (M being the num- 
ber of eigenvalue inside the contour and N the size of 
G). It should be noted that the diagonal elements of p 
represent the electron density in quantum mechanics ([1]) 
discussed in Section [TTJ 

Postmultiplying p by a set of M linearly independent ran- 
dom vectors Y NxM = {yi, .Jm}, the first expres- 
sion in ([3]) leads to a new set of M independent vectors 
Qnxm = Q2, -qivi} obtained by solving linear sys- 
tems along the contour 

Qnxm = ~ [ dZ G(Z)Y NxM , (4) 

J c 

while the second expression in ([3]), implies these vectors 
Q can be formally generated by the eigenfunctions X 
inside the contour 

Qnxm =X NxM W MxM With \Y.., xi'vj. (5) 

In other words, each Q column vector obtained in (U|) 
represents a different linear combination of unknown ba- 
sis functions X in ([5]). Using a Rayleigh-Ritz proce- 
dure, the problem ^ is now equivalent to computing the 
eigenpairs (e m , 3> m ) of the following reduced generalized 
eigenvalue problem of size M: 

A Q * = eB Q * (6) 
with A Q = Q T AQ and B Q = Q T BQ. (7) 

The Ritz values and vectors are then given by: 

\ m = e m , m = l, ...,M (8) 

-^NxM QnxM^MxM (9) 

where 3? MxM = {3>i, 3*2, --*m}- One can show that the 
obtained eigenvectors X are naturally B-orthonormal i.e. 
x^Bxj = <5ij, if the eigenvectors of the reduced problem 
(J6j) are BQ-orthonormal i.e. ^^Bq^j = <5ij. 



B. Practical Considerations and Pseudocode 

In practice, the vectors Q are computed by performing 
a numerical integration of each vectors G(Z)Y ((4]) along 
the complex contour C. Let us consider a circle centered 
in the middle of the search interval [A min , A max ] , it should 
be noted that the expression of the contour integration 
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FIG. 1: Schematic representation of the complex contour inte- 
gral defined by the positive half circle C + . In practice, the vec- 
tors Q are computed via a numerical integration (e.g. Gauss- 
Legendre quadrature) where only few linear systems G(Z)Y 
needs to be solved at specific points Z c along the contour. 



can be further simplified since G(Z) = G^(Z). Denot- 
ing C + the positive half circle of the complex contour, it 
comes if A is Hermitian: 

P=-T~J dZ{G(Z)-Gt(Z)}, (10) 

^7T« JC+ 

and if A is real symmetric: 
1 



C+ 



dZ 3{G(Z)}, 



(11) 



where stands for the imaginary part. Using a No- 
point Gauss-Legendre quadrature on the positive half cir- 
cle C + (see Figured]), with x e the e th Gauss node asso- 
ciated with the weight uj e , one obtains if A is Hermitian 
and Y,Q G<C NxM : 

n c 

Q = -^2-cJ e r(exp{i9 e ) G(Z e ) +exp(-t0 e ) Gt(Z e ))Y, 

e=l 

(12) 



with 



^mit 



z R = 



-, 6 e = -(n/2)(x e - 1), (13) 
A mM c + A min +rCTp(tg ^ (w) 



If A is real symmetric, Y, Q € K N >< M anc i one can use: 
Q = J2 o ex P(^) G ( z e) Y}> (15) 



(ii) Postmultiplying the density matrix ([3]) by Mo ran- 
dom vectors (rather than M) where Mo is greater than 
M. The reduced dense generalized eigenvalue problem 
of size Mo can be solved using standard eigenvalue 
LAPACK routines [l3|. Since we do not perform the or- 
thogonalization of the vectors Q, one has to make sure 
that Bq Mo Mq is symmetric positive definite i.e. Mo does 
not exceed an upper limit which can easily be obtained 
a posteriori. 



(NxMj 

^ itJ A NxM ^- 

2- Set Q = with Q G K NxM °; r = (A max - A min )/2; 

For e = 1, . . .N e 

compute 6 e = — (7r/2)(x c — 1), 
compute Z c = (A max + A min )/2 + r exp(i6> e ), 
solve (Z C B - A)Q e = Y to obtain Q e G C NxM ° 
compute Q = Q — (o; e /2)JR{r exp(i8 e ) Q e } 

End 



3- Form A c 



Q T AQ and Bq = Q BQ 



Q M X M c 

4- Solve Aq$ = eBq$ to obtain the Mo eigenvalue e m , 



and eigenvectors 4> 



Mq X Mq 



• M xM 



5- Set A m = e m and compute X 



NxM 



= Q 



NxMq Mq XMq 



If A m G [A m i n , A max ], A m is an eigenvalue solution 
and its eigenvector is X m (the m th column of X). 
6- Check convergence for the trace of the eigenvalues A m 

If iterative refinement is needed, compute Y = BX 

and go back to step 2 



FIG. 2: FEAST pseudocode (sequential version) for solving 
the generalized eigenvalue problem Ax = ABx, where A is 
real symmetric and B is s.p.d., and obtaining all the M eigen- 
pairs within a given interval [A m i n , A ma x] • The numerical inte- 
gration is performed using N c -point Gauss-Legendre quadra- 
ture with x e the e th Gauss node associated with the weight 
u; e . For the case N e = 8, one can use: 
{xi,wi) = (0.183434642495649,0.362683783378361), 
(x 3 ,u 3 ) = (0.525532409916328,0.313706645877887), 
ix B ,u> B ) = (0.796666477413626,0.222381034453374), 
(x 7 , u 7 ) = (0.960289856497536,0.101228536290376), 
and (X2i, W2i) i=1 . .. 4 = (—x 2 i-i,co2i-i) 



where stands for the real part. 

In order to reduce the numerical quadrature error of 
the contour integral, one may consider the two following 
improvements: 

(i) Performing outer-iterative refinement steps. Once the 
eigenvectors X are obtained © , a new set of initial guess 
vectors Y = BX can be used. Postmultiplying the den- 
sity matrix ([3]) by Y, one now obtains from ([5]) that Q 
converges to X since X T BX = I (i.e. W.y = S^j and 
then pBX = X). A fast test for convergence can be 
obtained by checking the trace of the eigenvalues ijHJl . 



The performances of the basic FEAST algorithm will 
then depend on a trade off between the choices of the 
number of Gauss quadrature points N e , the size of the 
subspace Mo, and the number of outer refinement loops. 
So far, using Mo > 1.5M, N = 8, and with at most 2 
refinement loops, we have consistently obtained a rela- 
tive residual equal or smaller than 10 -10 seeking up to 
1000 eigenpairs for a variety of problems. The basic 
pseudocode for the FEAST algorithm is given in Fig- 
ure in the case of A real symmetric. In the case of 
A complex Hermitian, we note the following changes: 
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Y,Q 6 C NxMo , * S C MoXMo , and the construction of 
the vectors Q in step-2 of the pseudocode must be mod- 
ified to satisfy (fT2|) . 



IV. NUMERICAL EXPERIMENTS 

In this section, we propose to demonstrate the numer- 
ical stability, robustness and scalability of the FEAST 
algorithm using three examples derived from electronic 
structure calculations of Carbon nanotube (CNT). 

A. Example I 

Let us first consider a family of eigenvalue problems, 
Test-CNT, obtained using a 2D FEM discretization of the 
DFT/Kohn-Sham equations at a given cross-section of a 
(13,0) CNT - the 2D atomistic potential is derived from 
the mode approach used in the CMB strategy for solving 
the full 3D problem presented in Q . In Test-CNT, A is 
real symmetric, and B is s.p.d., the size of both matrices 
is N = 12, 450 and their sparsity pattern is identical with 
a number of non-zero elements nnz = 86, 808. 

In Table U we report the times and relative residual 
obtained by the public domain eigenvalue solver pack- 
age ARPACK [1J| (using the shift-invert strategy) and 
the FEAST algorithm presented in Figure [2] for solving 
the Test-CNT example seeking up to M = 800 (low- 
est) eigenpairs. The inner linear systems in ARPACK 
and FEAST are solved using the shared-memory parallel 
direct solver PARDISO [15]. It should be noted, how- 
ever, that FEAST benefits more than ARPACK from the 
PARDISO solver, as the inner linear systems have mul- 
tiple right-hand sides. Although both algorithms could 
benefit from a parallel distributed implementation (e.g. 
using the P_ARPACK package), the simulation runs are 
here restricted to a given node of a 8-cores Intel Clover- 
town system (16Gb,2.66GHz) where the linear systems 
in FEAST are factorized and solved one after another. 
The performances of ARPACK and FEAST can also de- 
pend on fine tunings parameters such as the choices of the 
size of the subspace M (M = 1.5M here for both algo- 
rithms), the inner systems solvers, the number of contour 
points N c for FEAST, or the stopping criteria for obtain- 
ing the residual. The simulation results in this section 
are then not intended to compare quantitatively the two 
solvers but rather to point out the potentialities of the 
FEAST algorithm. 

In our experiments, the convergence criteria on the rel- 
ative residual for FEAST is obtained when the relative 
error on the trace of the eigenvalues A m in the search 
interval is smaller or equal to 10~ 13 . Table HT1 shows the 
variation of the relative error on the trace with the num- 
ber of outer-iterative refinement for FEAST. These re- 
sults demonstrate that only 2 to 3 refinement loops are 
necessary to obtain the small relative residuals for the 
different cases reported in Table |TJ It should be noted 
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TABLE I: Simulations times and relative residual 
maxi(||Axi — AiBxi||i/||Axi||i), obtained by the solver 
ARPACK and FEAST on the TEST-CNT system seeking 
M (lowest) eigenpairs for different search intervals. The 
simulations are performed on a Intel Clovertown (8cores, 1 
node, 2.66GHz, 16Gb). The shift-strategy has been used in 
ARPACK to accelerate the convergence (the regular mode 
would give ~ 300s for M = 100). The inner linear systems 
in ARPACK and FEAST are both solved using the direct 
parallel solver PARDISO [H on 8 cores. Finally, the size 
of the subspace has been chosen to be Mo = 1.5M for both 
algorithms, and the number of contour points for FEAST is 
fixed at N c = 8. 

that only one loop is necessary to obtain the eigenvalues 
with an accuracy of ~ 10~ 5 or below. 
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TABLE II: Variation of the relative error on the trace of 
the eigenvalues Yl m ^ m f° r different search intervals with the 
number of iterative refinement loops. The convergence cri- 
teria is set to 10~ 13 where the final relative residual on the 
eigenpairs is reported in Table [I] 

The simulation results in Table U demonstrate very 
good scalability for FEAST while the search interval 
keeps increasing but the number of contour points N e 
stays identical (i.e. the number of numerical operations 
stays the same for a given loop of FEAST with a fixed 
N c = 8 linear systems to solve). In addition, from Ta- 
ble IIII1 one can see how the robustness of the FEAST 
algorithm is affected while the number of contour points 
N e changes. In particular, N — 4 points along the con- 
tour did suffice to capture M = 100 eigenpairs with a 
relatively small residual (decreasing the simulation time 
reported in Table Q] for this case), while the case N = 16 
points generated a residual smaller than the one obtained 
by ARPACK (using M = 1.5M). 

B. Example II 

In another set of numerical experiments, we intend to 
demonstrate the robustness of FEAST in capturing the 
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TEST-CNT 
M = 100 


FEAST 
Time(s) Resid. # loops 


N c = 4 
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TABLE III: Performance results obtained by FEAST seeking 
M = 100 eigenpairs for different values of N c . The conver- 
gence is obtained when the error on the trace is equal or 
smaller to 10~ 13 . 



multiplicity of the eigenvalues. We propose to create ar- 
tificially new TEST-CNT systems called fc(N,M) where 
the matrices A and B are repeated k times along the 
main diagonal (the new system matrix is block diagonal 
with k blocks). Physically these systems can describe 
the cross section of a bundle composed by k CNTs, where 
we do not consider the interactions between the different 
tubes such that each eigenvalue is now k times degener- 
ate. If we keep the same search interval used to obtain 
M = 100 eigenpairs for k = 1 (where the size of the ma- 
trices A and B is N), 100/c eigenpairs must now be found 
for k > 1, where each one of them have the multiplicity k. 
In Table HVl we report the simulation times and relative 
residuals obtained using ARPACK and FEAST on these 
fc(N,M) TEST-CNT systems. For the case 8(N,M), for 
example, the size of the new system matrix is 99, 600 
and the first 100 eigenvalues have all the multiplicity 8 
(so 800 eigenpairs are found in total). The simulation re- 
sults show linear scalability performances with the size of 
the system and the number of eigenpairs. In contrast to 
ARPACK where the number of matrix-vector multiplica- 
tions and linear system solves would keep increasing with 
k, the number of operations in FEAST stays the same for 
all these cases. The scalability of the algorithm depends 
then mainly on the scalability of the linear system solver. 



C. Example III 

We have shown that FEAST can re-use the computed 
subspace as suitable initial guess for performing itera- 
tive refinements. This capability can also be of benefit 
to modern applications in science and engineering where 
it is often necessary to solve a series of eigenvalue prob- 
lems that are close one another. In bandstructure cal- 
culations, in particular, many eigenvalue problems of the 
form (A + Sk)xk = AkBxk need to be solved at different 
locations in the k-space (i.e. for different values of k and 
where S is Hermitian with So = I). Let us consider the 
eigenvalue sparse system of size N = 492, 982 obtained for 
a (5,5) metallic CNT using our in-house DFT/real-space 
mesh technique framework for bandstructure calculations 
of nanowires-type structure [IB]. In Figure [H we propose 
to solve this eigenvalue problem using the same search 
interval for the eigenvalues A for different locations of k 
where the subspace computed by FEAST at the point 
k 1 is successively used as initial guess for the neigh- 
boring point k. In addition, the inner linear systems in 
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FIG. 3: Bandstructure calculations of a (5,5) metallic CNT. 
The eigenvalue problems are solved successively for all the k 
points (from T to X), while the computed subspace of size 
Mo = 25 at the point k is used as initial guess for the point 
k+ 1. The number of eigenvalues found ranges from 13 to 20, 
and by the third k point, the FEAST convergence is obtained 
using only one refinement loop. The convergence is obtained 
with the relative error on trace of the eigenvalues smaller or 
equal to 10 _s , while the inner linear systems are solved us- 
ing an iterative method with an accuracy of 10 -3 . The final 
relative residuals on the eigenpairs range from 10" 3 to 10" 5 . 



TABLE IV: Simulations times and relative residual 
maxi(||Axi — A|Bxi||i/||Axi||i), obtained by the solver 
ARPACK and FEAST on the fe(N, M) TEST-CNT systems 
which artificially reproduce k times the original TEST-CNT 
system. The kM (lowest) eigenpairs are found where each 
eigenvalue has a multiplicity of k. 



FEAST are solved using an iterative method with precon- 
ditioner where a modest relative residual of 10 -3 is used 
(e.g. a suitable banded preconditioner can be obtained 
using a mode approach Q). It should be noted that the 
convergence criteria for the relative error on the trace 
of the eigenvalues is chosen much smaller at 10~ 8 , while 
the eigenvectors are expected to be obtained within the 
same (or a smaller) order of accuracy that the one used 
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for the solutions of the inner systems. Figure [3] shows 
that 13 to 20 eigenvalues (i.e. energies) are found within 
the selected search interval along the different k points 
(from the T to the X point in the graph). Although the 
size of the subspace stays identical at Mo = 25, after the 
first initial point at k = (r point in the graph) FEAST 
converges within only one refinement loop for almost all 
the other k points. 

V. DISCUSSIONS 

In comparison to iterative Krylov subspace techniques, 
FEAST can be cast as a "direct" technique which is 
based on an exact mathematical derivation (J3J. FEAST 
does naturally then capture all the multiplicity and no- 
orthogonalization procedure is necessary (such as Gram- 
Schmidt orthogonalization process). As described above, 
the main computational tasks in FEAST consist of solv- 
ing N e independent linear systems along the contour 
with Mo right-hand-sides and a reduced dense general- 
ized eigenvalue problem of size Mo. Since FEAST has 
the ability to re-use the basis from the previously com- 
puted subspace, an outer-iterative refinement procedure 
is proposed to improve the accuracy of the solutions. The 
capability to take advantage of suitable initial guess can 
also be of benefit to modern applications in sciences and 
engineering where it is often necessary to solve a series 
of eigenvalue problems that are close one another (e.g. 
Bandstructure calculations in Example III Section llV|) . 

In one sense, the difficulty of solving an eigenvalue 
problem has been replaced by the difficulty of solving 
a linear system with multiple right-hand sides. For large 
sparse systems, this latter can be solved using either a 
direct system solver such as PARDISO [l5[ (as proposed 
in Section UV|) . or an iterative system solver with precon- 
ditioner. In turn, for banded systems or banded precon- 
ditioner, FEAST can be seen as an outer-layer for the 
author's SPIKE parallel system solver [l7[. It should be 
noted that the inner linear systems arising from stan- 
dard eigenvalue solvers (using the shift-strategy), need 
often to be solved highly accurately via direct methods. 
Direct system solvers, however, are not always suited 
for addressing large-scale modern applications because of 
memory requirements. In Example III of Section lIVI we 
have shown that FEAST can take advantage of iterative 
solvers for solving the inner linear systems with modest 



relative residual and obtaining the eigenvectors solution 
within the same order of accuracy. The resulting sub- 
space could also be used as a very good initial guess for a 
one step more accurate refinement procedure (i.e. using 
more accurate relative residual for the inner systems). 

FEAST exhibits important potentialities for paral- 
lelism at three different levels: (i) many search interval 
[Amin,A max ] can be run independently, (ii) each linear 
systems can be solved simultaneously (e.g. on each node 
of parallel architecture where the factorization of the lin- 
ear system can be done only once for all the refinement 
loops), (hi) the linear system solver can be parallel (e.g. 
within a given node as in Section IIV[) . Depending on 
the parallel architecture at hand, the local memory of 
a given node and the properties of the matrices of the 
eigenvalue problems, one may preferably select one par- 
allel option among the others, or just take advantage of 
a combination of those. In particular, there will be a 
trade off between how many search intervals to consider 
and how many eigenpairs FEAST can handle by inter- 
vals. For example if Mo is more than few thousands, one 
could either (i) solve the obtained reduced system of size 
Mo using efficient dense parallel symmetric eigenvalue 
solvers [18j . or (ii) propose to divide the initial search 
interval into two or more to be processed in parallel. In 
addition, it should be noted that the orthonormalization 
step is absent from FEAST which will drastically reduce 
the communication overhead for performing scalar prod- 
ucts on high-end parallel architectures (the scalar prod- 
uct in step-3 in Fig. [2] has to be done only once per 
iterative refinement). Given the recent advances in par- 
allel architectures and parallel linear system solvers, it 
is reasonable to envision using FEAST in a near future 
for obtaining up to millions of eigenpairs of large sparse 
symmetric eigenvalue problems. Finally the capabilities 
of FEAST could potentially be enhanced for addressing 
non-symmetric eigenvalue problems where the contour 
integration would then be performed in a given region of 
the complex space. 
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